-
Notifications
You must be signed in to change notification settings - Fork 62
Include /var/adm/messages in archived zone logs #9448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
System messages on illumos are logged to `/var/adm/messages`. In the event of a hardware or kernel issue, an archive of these logs can be for understanding the nature of the failure. Currently we archive all Oxide service logs from the global zone and any zones we create, but not `/var/adm/messages`. Update the archiver to include the latter as well.
|
Validated on |
| for zone in oxz_zones { | ||
| let logdir = if zone.global() { | ||
| PathBuf::from("/var/svc/log") | ||
| zone.path.join("var/svc/log") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'm fine with this, but just to understand) this is a conditional within the "global zone" case anyway - so behaviorally this is the same as PathBuf::from ("/var/svc/log"), right? Or are there cases where zone.path is something other than / for global zones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for this question. If anything I'd worry this would break the tests, because they seem to construct their own Zone with a path in a tempdir. (Are they archiving the current system's /var/adm/messages, with this change?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or are there cases where zone.path is something other than / for global zones?
AFAIK no, except in tests. Prior to this PR none of the tests in this module created a fake global zone.
If anything I'd worry this would break the tests, because they seem to construct their own Zone with a path in a tempdir.
The opposite, without this switching to a relative path at
| let adm_logdir = zone.path.join("var/adm"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I misread the diff 🤦 That makes sense.
| &self, | ||
| debug_dir: &DebugDataset, | ||
| logdir: PathBuf, | ||
| log_name_pattern: &str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though this is a non-pub function, it's probably worth adding docs for this (e.g., it's a prefix; it interacts with the include_live arguments, it's searching within the logdir path, etc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, added in 6e5ba44
davepacheco
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this. I am hoping to do some significant rework of a lot of this code soon. Some of it mechanical, like renaming the module and creating submodules. But I'm also hoping to make some more substantial changes to rework file collection so that it's easier to add and test support for collecting new files. I can plan to base my work on this branch, if you expect this to land soon, or I can incorporate this feature into the work I plan to do (but that may not finish in time for R18). Either works for me.
| if include_live { | ||
| let pattern = logdir | ||
| .join("*.log*") | ||
| .join(format!("{log_name_pattern}*")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like kind of a dicey interface to me, but I feel like that's true for a lot of the existing pieces here. I'm hoping to rework the way a bunch of the log archival works to make it more testable anyway so this is probably fine for now.
| debug_dir, logdir, "*.log", zone_name, false, | ||
| ) | ||
| .await?; | ||
| if zone.global() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this file useful in non-global zones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've never had a reason to check /var/adm/messages other than in the global zone so far, on the other hand, there's no harm in capturing them from all zones as they're quite small and they might be useful in the future.
Updated with ddc56e2.
| .await?; | ||
| if zone.global() { | ||
| let adm_logdir = zone.path.join("var/adm"); | ||
| self.archive_logs_from_zone_path( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(probably not suggesting any changes here)
I expect that this will drop the "messages" file alongside all the SMF logs in the debug dataset. In an ideal world, I'd rather we organized these files more clearly (consumers of this information have to know that "messages" is not an SMF service but a different thing). But this is already a problem today. The only real new impact of this PR is that oxlog may wind up reporting these files under the messages service or something strange like that? Fixing this seems beyond the scope of this PR. I'll file an issue for the general problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct, today oxlog will filter out the archived messages files
omicron/dev-tools/oxlog/src/lib.rs
Lines 384 to 398 in 650fde5
| const OX_SMF_PREFIXES: [&str; 2] = ["oxide-", "system-illumos-"]; | |
| /// Return true if the provided file name appears to be a valid log file for an | |
| /// Oxide-managed SMF service. | |
| /// | |
| /// Note that this operates on the _file name_. Any leading path components will | |
| /// cause this check to return `false`. | |
| pub fn is_oxide_smf_log_file(filename: impl AsRef<str>) -> bool { | |
| // Log files are named by the SMF services, with the `/` in the FMRI | |
| // translated to a `-`. | |
| let filename = filename.as_ref(); | |
| OX_SMF_PREFIXES | |
| .iter() | |
| .any(|prefix| filename.starts_with(prefix) && filename.contains(".log")) | |
| } |
BRM42220082 # ls -l /pool/ext/3516350f-4b93-4c70-a0ad-5fe1f2547492/crypt/debug/global
total 18
-rw-r--r-- 1 root root 12 Dec 1 19:27 messages.1764616962
-rw-r--r-- 1 root root 12 Dec 1 19:32 oxide-sled-agent:default.log.1764617303
| for zone in oxz_zones { | ||
| let logdir = if zone.global() { | ||
| PathBuf::from("/var/svc/log") | ||
| zone.path.join("var/svc/log") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for this question. If anything I'd worry this would break the tests, because they seem to construct their own Zone with a path in a tempdir. (Are they archiving the current system's /var/adm/messages, with this change?)
|
|
||
| let tempdir_path = tempdir.path().as_str().to_string(); | ||
| let global_zone = Zone::from_str(&format!( | ||
| "0:global:running:{tempdir_path}::ipkg:shared" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a separate temporary directory? If I'm reading this right, this test is running the archiver to collect data from this temporary directory. It used to only pretend to have one zone, and now it has two. But wouldn't the first archival delete the first set of log files so that the second one has nothing to collect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good call.
The tests works despite sharing the same directory because the the global zone archiver was looking in var/svc/log and var/adm, vs root/var/svc/log for other zone.
Updated with b386fe2.
| ) | ||
| .await?; | ||
|
|
||
| let adm_logdir = zone.path.join("var/adm"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems not quite right for non-global zones.
If I'm understanding right:
- for the global zone,
zone.path()is/. This is the zone's path and also the zone's root filesystem path. - for the non-global zone,
zone.path()is something else. This is the zone's path, but it's not the root filesystem. That's underroot.
I'd suggest adding up at L1102 or so:
let zone_root = if zone.global() { zone.path().to_owned() } else { zone.path().join("root"); };Then you can just have:
let logdir = zone_root.join("var/svc/log");for both cases. And then here, you can use zone_root.join("/var/adm").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thanks! Fixed with 1a457af
|
Looks good aside from the comment wrapping! |
| // Archives log files found in `logdir` for zone `zone_name` to the destination debug dataset. | ||
| // | ||
| // `log_name_pattern` should be a glob pattern that matches against file names, e.g., `*.log`, `mylog`. | ||
| // If `include_live` is `true`, this will archive all logs, matching on `{log_name_pattern}*`. | ||
| // If it is `false`, only rotated logs will be archived, matching on `{log_name_pattern}.[0-9]`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I didn't notice this before. These lines are over 80 columns. Can you rewrap them? (rustfmt cannot enforce long lines in comments without unstable options, unfortunately.)
|
(Sorry, I had queued up the comment about the wrapped lines yesterday, then submitted a comment on the PR itself thinking it had submitted that comment, but it hadn't. I just now submitted the note I was referring to.) |
System messages on illumos are logged to
/var/adm/messages. In the event of a hardware or kernel issue, an archive of these logs can be for understanding the nature of the failure.Currently we archive all Oxide service logs from the global zone and any zones we create, but not
/var/adm/messages. Update the archiver to include the latter as well.